Overview

Dataset statistics

Number of variables: 22
Number of observations: 45215
Missing cells: 136534
Missing cells (%): 13.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.6 MiB
Average record size in memory: 176.0 B
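
The missing-cell percentage follows from the other figures in this table, since the number of cells is variables times observations. A minimal Python sketch of that arithmetic, using only the values reported here (the variable names are illustrative and not part of the report):

```python
# Values copied from the "Dataset statistics" table.
n_variables = 22
n_observations = 45215
missing_cells = 136534

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations

# Share of cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```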

Variable types

Categorical: 7
Numeric: 8
DateTime: 7

Alerts

Matricula has a high cardinality: 1822 distinct values [High cardinality]
NomeMotorista has a high cardinality: 2695 distinct values [High cardinality]
Tara is highly correlated with bruto [High correlation]
bruto is highly correlated with Tara and 2 other fields [High correlation]
Liquido is highly correlated with bruto and 1 other fields [High correlation]
qtdpedida is highly correlated with bruto and 1 other fields [High correlation]
Tara is highly correlated with bruto [High correlation]
bruto is highly correlated with Tara and 2 other fields [High correlation]
Liquido is highly correlated with bruto and 1 other fields [High correlation]
qtdpedida is highly correlated with bruto and 1 other fields [High correlation]
Tara is highly correlated with bruto [High correlation]
bruto is highly correlated with Tara and 1 other fields [High correlation]
Liquido is highly correlated with qtdpedida and 1 other fields [High correlation]
qtdpedida is highly correlated with bruto and 1 other fields [High correlation]
percDiff is highly correlated with Liquido [High correlation]
DescProduto is highly correlated with TipoViatura [High correlation]
TipoViatura is highly correlated with DescProduto and 1 other fields [High correlation]
PostoOperacao is highly correlated with TipoViatura [High correlation]
TipoDoc is highly correlated with Tara and 3 other fields [High correlation]
TipoViatura is highly correlated with DescProduto and 1 other fields [High correlation]
CodProduto is highly correlated with DescProduto and 3 other fields [High correlation]
DescProduto is highly correlated with TipoViatura and 5 other fields [High correlation]
Tara is highly correlated with TipoDoc and 4 other fields [High correlation]
bruto is highly correlated with TipoDoc and 7 other fields [High correlation]
Liquido is highly correlated with CodProduto and 4 other fields [High correlation]
qtdpedida is highly correlated with TipoDoc and 5 other fields [High correlation]
percDiff is highly correlated with TipoDoc and 2 other fields [High correlation]
PostoOperacao is highly correlated with TipoViatura and 1 other fields [High correlation]
CodMotorista is highly correlated with bruto [High correlation]
PostoOperacao has 16288 (36.0%) missing values [Missing]
TaraData has 26694 (59.0%) missing values [Missing]
DataInicioOperacao has 26694 (59.0%) missing values [Missing]
DataFimOperacao has 31522 (69.7%) missing values [Missing]
CodMotorista has 35326 (78.1%) missing values [Missing]
DataCriacao has unique values [Unique]
Dataentrada has unique values [Unique]
percDiff has 760 (1.7%) zeros [Zeros]
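
The "High correlation" alerts flag pairs of variables whose correlation coefficient exceeds some threshold. As a rough illustration of what such a check involves, the sketch below computes a Pearson coefficient in plain Python. The sample weights and the 0.9 cut-off are assumptions made for this example; they are not taken from the dataset or from the report's configuration, and the report may also rely on other correlation measures.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical tare and gross weights (kg); not rows from the dataset.
tara_sample = [14000, 15000, 16000, 19000, 8000]
bruto_sample = [44000, 45000, 46000, 50000, 22000]

HIGH_CORRELATION_THRESHOLD = 0.9  # assumed cut-off for illustration

r = pearson(tara_sample, bruto_sample)
is_high = abs(r) > HIGH_CORRELATION_THRESHOLD
print(f"r = {r:.3f}, flagged = {is_high}")
```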

Reproduction

Analysis started: 2022-05-13 16:01:56.616308
Analysis finished: 2022-05-13 16:02:12.930948
Duration: 16.31 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

TipoDoc
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 353.4 KiB

Value | Count
TP | 45055
SP | 160

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 90430
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TP
2nd row: TP
3rd row: TP
4th row: TP
5th row: TP

Common Values

Value | Count | Frequency (%)
TP | 45055 | 99.6%
SP | 160 | 0.4%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
tp | 45055 | 99.6%
sp | 160 | 0.4%

Most occurring characters

Value | Count | Frequency (%)
P | 45215 | 50.0%
T | 45055 | 49.8%
S | 160 | 0.2%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 90430 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
P | 45215 | 50.0%
T | 45055 | 49.8%
S | 160 | 0.2%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 90430 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
P | 45215 | 50.0%
T | 45055 | 49.8%
S | 160 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90430 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
P | 45215 | 50.0%
T | 45055 | 49.8%
S | 160 | 0.2%

TipoViatura
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 353.4 KiB

Value | Count
Z002 | 28227
Z004 | 16988

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 180860
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Z004
2nd row: Z002
3rd row: Z002
4th row: Z002
5th row: Z004

Common Values

Value | Count | Frequency (%)
Z002 | 28227 | 62.4%
Z004 | 16988 | 37.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
z002 | 28227 | 62.4%
z004 | 16988 | 37.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 90430 | 50.0%
Z | 45215 | 25.0%
2 | 28227 | 15.6%
4 | 16988 | 9.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 135645 | 75.0%
Uppercase Letter | 45215 | 25.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 90430 | 66.7%
2 | 28227 | 20.8%
4 | 16988 | 12.5%
Uppercase Letter
Value | Count | Frequency (%)
Z | 45215 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 135645 | 75.0%
Latin | 45215 | 25.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 90430 | 66.7%
2 | 28227 | 20.8%
4 | 16988 | 12.5%
Latin
Value | Count | Frequency (%)
Z | 45215 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 180860 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 90430 | 50.0%
Z | 45215 | 25.0%
2 | 28227 | 15.6%
4 | 16988 | 9.4%

CodProduto
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.545947141
Minimum: 2
Maximum: 42
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 353.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2
Q1: 2
median: 3
Q3: 5
95-th percentile: 37
Maximum: 42
Range: 40
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 8.446174665
Coefficient of variation (CV): 1.522945396
Kurtosis: 12.83874307
Mean: 5.545947141
Median Absolute Deviation (MAD): 1
Skewness: 3.779239051
Sum: 250760
Variance: 71.33786648
Monotonicity: Not monotonic
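
Several of these statistics are derived from one another: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the range and IQR are simple differences. The sketch below re-derives them from the figures reported for CodProduto as a consistency check. It uses only the published numbers, and the variable names are illustrative.

```python
import math

# Figures copied from the CodProduto tables.
n_observations = 45215   # no missing values for this variable
total = 250760           # Sum
mean = 5.545947141
std = 8.446174665
q1, q3 = 2, 5
minimum, maximum = 2, 42

derived_mean = total / n_observations
cv = std / mean                    # coefficient of variation
variance = std ** 2
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum

print(f"mean={derived_mean:.6f} cv={cv:.6f} variance={variance:.5f}")
print(f"iqr={iqr} range={value_range}")
```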
Histogram with fixed size bins (bins=9)

Common values
Value | Count | Frequency (%)
3 | 14463 | 32.0%
5 | 14237 | 31.5%
2 | 12276 | 27.2%
42 | 1129 | 2.5%
6 | 967 | 2.1%
40 | 871 | 1.9%
11 | 450 | 1.0%
37 | 429 | 0.9%
7 | 393 | 0.9%

Smallest values
Value | Count | Frequency (%)
2 | 12276 | 27.2%
3 | 14463 | 32.0%
5 | 14237 | 31.5%
6 | 967 | 2.1%
7 | 393 | 0.9%
11 | 450 | 1.0%
37 | 429 | 0.9%
40 | 871 | 1.9%
42 | 1129 | 2.5%

Largest values
Value | Count | Frequency (%)
42 | 1129 | 2.5%
40 | 871 | 1.9%
37 | 429 | 0.9%
11 | 450 | 1.0%
7 | 393 | 0.9%
6 | 967 | 2.1%
5 | 14237 | 31.5%
3 | 14463 | 32.0%
2 | 12276 | 27.2%

DescProduto
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 9
Distinct (%): < 0.1%
Missing: 1
Missing (%): < 0.1%
Memory size: 353.4 KiB

Value | Count
CIMENT II/AL 32,5 R PALETTE | 14463
CIMENT I 42,5 R SAC | 14237
CIMENT II/AL 32,5 R SAC | 12276
CIMENT I 42,5 R SR3 PALETTE | 1128
CIMENT I 42,5 R PALETTE | 967
Other values (4) | 2143

Length

Max length: 28
Median length: 27
Mean length: 23.08342549
Min length: 17

Characters and Unicode

Total characters: 1043694
Distinct characters: 23
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CIMENT II/AL 32,5 R PALETTE
2nd row: CIMENT I 42,5 R SAC
3rd row: CIMENT I 42,5 R SAC
4th row: CIMENT I 42,5 R SAC
5th row: CIMENT I 42,5 R SR3 PALETTE

Common Values

Value | Count | Frequency (%)
CIMENT II/AL 32,5 R PALETTE | 14463 | 32.0%
CIMENT I 42,5 R SAC | 14237 | 31.5%
CIMENT II/AL 32,5 R SAC | 12276 | 27.2%
CIMENT I 42,5 R SR3 PALETTE | 1128 | 2.5%
CIMENT I 42,5 R PALETTE | 967 | 2.1%
CIMENT II/AL 42,5 N SAC | 871 | 1.9%
CHAUX CHA 10 SAC | 450 | 1.0%
CHAUX CHA 10 PALETTE | 429 | 0.9%
CIMENT I 42,5 R SR3 SAC | 393 | 0.9%
(Missing) | 1 | < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
ciment | 44335 | 19.6%
r | 43464 | 19.2%
sac | 28227 | 12.5%
ii/al | 27610 | 12.2%
32,5 | 26739 | 11.8%
42,5 | 17596 | 7.8%
palette | 16987 | 7.5%
i | 16725 | 7.4%
sr3 | 1521 | 0.7%
chaux | 879 | 0.4%
Other values (3) | 2629 | 1.2%

Most occurring characters

Value | Count | Frequency (%)
(space) | 184291 | 17.7%
I | 116280 | 11.1%
E | 78309 | 7.5%
T | 78309 | 7.5%
A | 74582 | 7.1%
C | 74320 | 7.1%
N | 45206 | 4.3%
R | 44985 | 4.3%
L | 44597 | 4.3%
5 | 44335 | 4.2%
Other values (13) | 258480 | 24.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 651174 | 62.4%
Space Separator | 184291 | 17.7%
Decimal Number | 136284 | 13.1%
Other Punctuation | 71945 | 6.9%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
I | 116280 | 17.9%
E | 78309 | 12.0%
T | 78309 | 12.0%
A | 74582 | 11.5%
C | 74320 | 11.4%
N | 45206 | 6.9%
R | 44985 | 6.9%
L | 44597 | 6.8%
M | 44335 | 6.8%
S | 29748 | 4.6%
Other values (4) | 20503 | 3.1%
Decimal Number
Value | Count | Frequency (%)
5 | 44335 | 32.5%
2 | 44335 | 32.5%
3 | 28260 | 20.7%
4 | 17596 | 12.9%
1 | 879 | 0.6%
0 | 879 | 0.6%
Other Punctuation
Value | Count | Frequency (%)
, | 44335 | 61.6%
/ | 27610 | 38.4%
Space Separator
Value | Count | Frequency (%)
(space) | 184291 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 651174 | 62.4%
Common | 392520 | 37.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
I | 116280 | 17.9%
E | 78309 | 12.0%
T | 78309 | 12.0%
A | 74582 | 11.5%
C | 74320 | 11.4%
N | 45206 | 6.9%
R | 44985 | 6.9%
L | 44597 | 6.8%
M | 44335 | 6.8%
S | 29748 | 4.6%
Other values (4) | 20503 | 3.1%
Common
Value | Count | Frequency (%)
(space) | 184291 | 47.0%
5 | 44335 | 11.3%
, | 44335 | 11.3%
2 | 44335 | 11.3%
3 | 28260 | 7.2%
/ | 27610 | 7.0%
4 | 17596 | 4.5%
1 | 879 | 0.2%
0 | 879 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1043694 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 184291 | 17.7%
I | 116280 | 11.1%
E | 78309 | 7.5%
T | 78309 | 7.5%
A | 74582 | 7.1%
C | 74320 | 7.1%
N | 45206 | 4.3%
R | 44985 | 4.3%
L | 44597 | 4.3%
5 | 44335 | 4.2%
Other values (13) | 258480 | 24.8%

estado
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 353.4 KiB

Value | Count
F | 45201
C | 14

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 45215
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: F
3rd row: F
4th row: F
5th row: F

Common Values

Value | Count | Frequency (%)
F | 45201 | > 99.9%
C | 14 | < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
f | 45201 | > 99.9%
c | 14 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
F | 45201 | > 99.9%
C | 14 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 45215 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
F | 45201 | > 99.9%
C | 14 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 45215 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
F | 45201 | > 99.9%
C | 14 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 45215 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
F | 45201 | > 99.9%
C | 14 | < 0.1%

Tara
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 1511
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15519.7983
Minimum: 0
Maximum: 50580
Zeros: 160
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 353.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 8220
Q1: 14600
median: 15140
Q3: 15880
95-th percentile: 19492
Maximum: 50580
Range: 50580
Interquartile range (IQR): 1280

Descriptive statistics

Standard deviation: 4155.132945
Coefficient of variation (CV): 0.2677311177
Kurtosis: 11.96684281
Mean: 15519.7983
Median Absolute Deviation (MAD): 640
Skewness: 1.985005454
Sum: 701727680
Variance: 17265129.79
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
15100 | 553 | 1.2%
14900 | 527 | 1.2%
14800 | 510 | 1.1%
15000 | 500 | 1.1%
15200 | 495 | 1.1%
15300 | 489 | 1.1%
14960 | 462 | 1.0%
14980 | 459 | 1.0%
15240 | 442 | 1.0%
15060 | 440 | 1.0%
Other values (1501) | 40338 | 89.2%

Smallest values
Value | Count | Frequency (%)
0 | 160 | 0.4%
1560 | 1 | < 0.1%
1760 | 2 | < 0.1%
1780 | 1 | < 0.1%
1800 | 1 | < 0.1%
1860 | 1 | < 0.1%
1880 | 1 | < 0.1%
2480 | 2 | < 0.1%
2500 | 9 | < 0.1%
2520 | 23 | 0.1%

Largest values
Value | Count | Frequency (%)
50580 | 1 | < 0.1%
49380 | 1 | < 0.1%
48380 | 1 | < 0.1%
47780 | 1 | < 0.1%
47440 | 1 | < 0.1%
47380 | 1 | < 0.1%
47100 | 1 | < 0.1%
47020 | 1 | < 0.1%
47000 | 1 | < 0.1%
46940 | 1 | < 0.1%

bruto
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 1432
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43856.38947
Minimum: 0
Maximum: 128550
Zeros: 160
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 353.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 21650
Q1: 44150
median: 44850
Q3: 45650
95-th percentile: 50600
Maximum: 128550
Range: 128550
Interquartile range (IQR): 1500

Descriptive statistics

Standard deviation: 10023.69258
Coefficient of variation (CV): 0.2285571771
Kurtosis: 18.32910749
Mean: 43856.38947
Median Absolute Deviation (MAD): 750
Skewness: 1.196369788
Sum: 1982966650
Variance: 100474412.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
44750 | 993 | 2.2%
44850 | 963 | 2.1%
44650 | 957 | 2.1%
44800 | 950 | 2.1%
44700 | 935 | 2.1%
44950 | 914 | 2.0%
44600 | 893 | 2.0%
44500 | 893 | 2.0%
44550 | 880 | 1.9%
44900 | 870 | 1.9%
Other values (1422) | 35967 | 79.5%

Smallest values
Value | Count | Frequency (%)
0 | 160 | 0.4%
2550 | 1 | < 0.1%
2750 | 1 | < 0.1%
2800 | 1 | < 0.1%
2850 | 1 | < 0.1%
2900 | 1 | < 0.1%
3250 | 1 | < 0.1%
3300 | 1 | < 0.1%
3900 | 1 | < 0.1%
4400 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
128550 | 1 | < 0.1%
126500 | 1 | < 0.1%
125650 | 1 | < 0.1%
125100 | 1 | < 0.1%
125050 | 2 | < 0.1%
125000 | 1 | < 0.1%
124900 | 1 | < 0.1%
124800 | 1 | < 0.1%
124400 | 2 | < 0.1%
124250 | 1 | < 0.1%

Liquido
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2408
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28336.6938
Minimum: 0
Maximum: 103500
Zeros: 164
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 353.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 12050
Q1: 29450
median: 29760
Q3: 30000
95-th percentile: 34740
Maximum: 103500
Range: 103500
Interquartile range (IQR): 550

Descriptive statistics

Standard deviation: 8480.345425
Coefficient of variation (CV): 0.2992708142
Kurtosis: 17.71922392
Mean: 28336.6938
Median Absolute Deviation (MAD): 270
Skewness: 1.763006212
Sum: 1281243610
Variance: 71916258.53
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
29850 | 708 | 1.6%
29900 | 703 | 1.6%
29750 | 672 | 1.5%
29800 | 668 | 1.5%
29950 | 626 | 1.4%
29700 | 568 | 1.3%
29810 | 534 | 1.2%
29740 | 514 | 1.1%
29770 | 512 | 1.1%
29790 | 511 | 1.1%
Other values (2398) | 39199 | 86.7%

Smallest values
Value | Count | Frequency (%)
0 | 164 | 0.4%
500 | 1 | < 0.1%
990 | 3 | < 0.1%
1000 | 2 | < 0.1%
1010 | 2 | < 0.1%
1020 | 3 | < 0.1%
1480 | 1 | < 0.1%
1490 | 2 | < 0.1%
1520 | 1 | < 0.1%
1530 | 2 | < 0.1%

Largest values
Value | Count | Frequency (%)
103500 | 1 | < 0.1%
100650 | 1 | < 0.1%
99950 | 1 | < 0.1%
99900 | 1 | < 0.1%
99650 | 1 | < 0.1%
99550 | 1 | < 0.1%
99400 | 1 | < 0.1%
99250 | 1 | < 0.1%
98900 | 1 | < 0.1%
98850 | 2 | < 0.1%

qtdpedida
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 118
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28778.48502
Minimum: 500
Maximum: 105000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 353.4 KiB

Quantile statistics

Minimum: 500
5-th percentile: 13500
Q1: 30000
median: 30000
Q3: 30000
95-th percentile: 35000
Maximum: 105000
Range: 104500
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 8847.772134
Coefficient of variation (CV): 0.3074439856
Kurtosis: 19.01030589
Mean: 28778.48502
Median Absolute Deviation (MAD): 0
Skewness: 2.312198234
Sum: 1301219200
Variance: 78283071.73
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
30000 | 33336 | 73.7%
15000 | 2753 | 6.1%
35000 | 2666 | 5.9%
10000 | 741 | 1.6%
25000 | 727 | 1.6%
12000 | 439 | 1.0%
20000 | 422 | 0.9%
5000 | 289 | 0.6%
25500 | 289 | 0.6%
9000 | 273 | 0.6%
Other values (108) | 3280 | 7.3%

Smallest values
Value | Count | Frequency (%)
500 | 1 | < 0.1%
1000 | 10 | < 0.1%
1500 | 7 | < 0.1%
2000 | 16 | < 0.1%
2500 | 8 | < 0.1%
2800 | 3 | < 0.1%
3000 | 70 | 0.2%
3500 | 5 | < 0.1%
4000 | 38 | 0.1%
4500 | 7 | < 0.1%

Largest values
Value | Count | Frequency (%)
105000 | 1 | < 0.1%
100000 | 46 | 0.1%
95000 | 3 | < 0.1%
90000 | 94 | 0.2%
88000 | 2 | < 0.1%
87000 | 2 | < 0.1%
86000 | 9 | < 0.1%
85000 | 144 | 0.3%
84000 | 17 | < 0.1%
83000 | 15 | < 0.1%

percDiff
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 1432
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.003166968
Minimum: -3.08
Maximum: 100
Zeros: 760
Zeros (%): 1.7%
Negative: 9691
Negative (%): 21.4%
Memory size: 353.4 KiB

Quantile statistics

Minimum: -3.08
5-th percentile: -1.133333333
Q1: 0.1
median: 0.7
Q3: 1.3
95-th percentile: 2.25
Maximum: 100
Range: 103.08
Interquartile range (IQR): 1.2

Descriptive statistics

Standard deviation: 6.054100686
Coefficient of variation (CV): 6.034988069
Kurtosis: 256.3706532
Mean: 1.003166968
Median Absolute Deviation (MAD): 0.6
Skewness: 15.85313716
Sum: 45358.19448
Variance: 36.65213511
Monotonicity: Not monotonic
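
The report does not define percDiff. One reading that is consistent with the reported figures is the percentage by which the net weight (Liquido) falls short of the ordered quantity (qtdpedida): the value 100 occurs 164 times, matching the 164 zeros in Liquido, and a frequent value such as 0.3333333333 corresponds to a 100-unit shortfall on a 30000-unit order. The sketch below encodes that assumed formula. Both the function name and the formula are hypotheses, not taken from the dataset's documentation.

```python
def perc_diff(qtd_pedida: float, liquido: float) -> float:
    """Assumed definition: shortfall of net weight as a percentage of the order.

    Positive when less than the ordered quantity was loaded,
    negative when more was loaded.
    """
    if qtd_pedida == 0:
        raise ValueError("ordered quantity must be non-zero")
    return 100 * (qtd_pedida - liquido) / qtd_pedida

# Illustrative inputs only; these are not rows from the dataset.
print(perc_diff(30000, 29900))  # small shortfall
print(perc_diff(30000, 0))      # nothing loaded
print(perc_diff(30000, 30300))  # overload gives a negative value
```

Under this reading, the 21.4% of negative values would be loads that exceeded the ordered quantity.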
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
1 | 861 | 1.9%
0.3333333333 | 859 | 1.9%
0.6666666667 | 834 | 1.8%
0.5 | 807 | 1.8%
0 | 760 | 1.7%
0.8333333333 | 700 | 1.5%
0.6 | 677 | 1.5%
0.8 | 663 | 1.5%
0.4 | 652 | 1.4%
0.1666666667 | 645 | 1.4%
Other values (1422) | 37757 | 83.5%

Smallest values
Value | Count | Frequency (%)
-3.08 | 1 | < 0.1%
-3.071428571 | 1 | < 0.1%
-3.066666667 | 8 | < 0.1%
-3.055555556 | 1 | < 0.1%
-3.035714286 | 1 | < 0.1%
-3.033333333 | 5 | < 0.1%
-3 | 28 | 0.1%
-2.966666667 | 4 | < 0.1%
-2.96 | 1 | < 0.1%
-2.959183673 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
100 | 164 | 0.4%
2.904761905 | 1 | < 0.1%
2.903225806 | 1 | < 0.1%
2.901960784 | 1 | < 0.1%
2.9 | 41 | 0.1%
2.896551724 | 1 | < 0.1%
2.888888889 | 6 | < 0.1%
2.885714286 | 3 | < 0.1%
2.88 | 1 | < 0.1%
2.875 | 3 | < 0.1%

PostoOperacao
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 16288
Missing (%): 36.0%
Memory size: 353.4 KiB

Value | Count
SPEED2 | 5855
SPEED1 | 5432
PAL | 5017
AUTOPAC | 5008
PAL2 | 4056
Other values (2) | 3559

Length

Max length: 7
Median length: 6
Mean length: 5.126317973
Min length: 3

Characters and Unicode

Total characters: 148289
Distinct characters: 15
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PAL
2nd row: SPEED1
3rd row: SPEED1
4th row: SPEED1
5th row: PAL

Common Values

Value | Count | Frequency (%)
SPEED2 | 5855 | 12.9%
SPEED1 | 5432 | 12.0%
PAL | 5017 | 11.1%
AUTOPAC | 5008 | 11.1%
PAL2 | 4056 | 9.0%
ENV2 | 1805 | 4.0%
ENV3 | 1754 | 3.9%
(Missing) | 16288 | 36.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
speed2 | 5855 | 20.2%
speed1 | 5432 | 18.8%
pal | 5017 | 17.3%
autopac | 5008 | 17.3%
pal2 | 4056 | 14.0%
env2 | 1805 | 6.2%
env3 | 1754 | 6.1%

Most occurring characters

Value | Count | Frequency (%)
E | 26133 | 17.6%
P | 25368 | 17.1%
A | 19089 | 12.9%
2 | 11716 | 7.9%
S | 11287 | 7.6%
D | 11287 | 7.6%
L | 9073 | 6.1%
1 | 5432 | 3.7%
U | 5008 | 3.4%
T | 5008 | 3.4%
Other values (5) | 18888 | 12.7%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 129387 | 87.3%
Decimal Number | 18902 | 12.7%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
E | 26133 | 20.2%
P | 25368 | 19.6%
A | 19089 | 14.8%
S | 11287 | 8.7%
D | 11287 | 8.7%
L | 9073 | 7.0%
U | 5008 | 3.9%
T | 5008 | 3.9%
O | 5008 | 3.9%
C | 5008 | 3.9%
Other values (2) | 7118 | 5.5%
Decimal Number
Value | Count | Frequency (%)
2 | 11716 | 62.0%
1 | 5432 | 28.7%
3 | 1754 | 9.3%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 129387 | 87.3%
Common | 18902 | 12.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
E | 26133 | 20.2%
P | 25368 | 19.6%
A | 19089 | 14.8%
S | 11287 | 8.7%
D | 11287 | 8.7%
L | 9073 | 7.0%
U | 5008 | 3.9%
T | 5008 | 3.9%
O | 5008 | 3.9%
C | 5008 | 3.9%
Other values (2) | 7118 | 5.5%
Common
Value | Count | Frequency (%)
2 | 11716 | 62.0%
1 | 5432 | 28.7%
3 | 1754 | 9.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 148289 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
E | 26133 | 17.6%
P | 25368 | 17.1%
A | 19089 | 12.9%
2 | 11716 | 7.9%
S | 11287 | 7.6%
D | 11287 | 7.6%
L | 9073 | 6.1%
1 | 5432 | 3.7%
U | 5008 | 3.4%
T | 5008 | 3.4%
Other values (5) | 18888 | 12.7%

DataCriacao
Date

UNIQUE

Distinct: 45215
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 353.4 KiB
Minimum: 2019-11-29 10:30:00.977000
Maximum: 2022-05-05 11:19:38.347000
Histogram with fixed size bins (bins=50)

Dataentrada
Date

UNIQUE

Distinct: 45215
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 353.4 KiB
Minimum: 1900-01-01 00:00:00
Maximum: 2022-05-05 11:20:56.543000
Histogram with fixed size bins (bins=50)
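
A minimum of 1900-01-01 00:00:00 in a column whose maximum is in May 2022 looks like a placeholder rather than a real entry time; it would also stretch the 50-bin histogram across more than a century. That is an interpretation, not something the report states. Assuming the value is a sentinel for "no date", a minimal sketch of treating it as missing before computing the range (the constant and function names are illustrative):

```python
from datetime import datetime

# Assumed placeholder value; the report does not say what 1900-01-01 means.
SENTINEL = datetime(1900, 1, 1)

def clean_timestamps(values):
    """Replace the assumed sentinel date with None so it counts as missing."""
    return [None if v == SENTINEL else v for v in values]

# Illustrative sample: the sentinel plus the reported maximum of Dataentrada.
sample = [
    datetime(1900, 1, 1),
    datetime(2022, 5, 5, 11, 20, 56, 543000),
]

cleaned = clean_timestamps(sample)
real_values = [v for v in cleaned if v is not None]
earliest = min(real_values)
print(earliest)
```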

TaraData
Date

MISSING

Distinct: 18521
Distinct (%): 100.0%
Missing: 26694
Missing (%): 59.0%
Memory size: 353.4 KiB
Minimum: 2019-11-30 11:44:20.763000
Maximum: 2022-05-05 11:45:59.750000
Histogram with fixed size bins (bins=50)

DataInicioOperacao
Date

MISSING

Distinct: 18521
Distinct (%): 100.0%
Missing: 26694
Missing (%): 59.0%
Memory size: 353.4 KiB
Minimum: 2019-11-30 11:44:20.763000
Maximum: 2022-05-05 11:45:59.750000
Histogram with fixed size bins (bins=50)

DataFimOperacao
Date

MISSING

Distinct: 13693
Distinct (%): 100.0%
Missing: 31522
Missing (%): 69.7%
Memory size: 353.4 KiB
Minimum: 2019-11-30 11:45:59.143000
Maximum: 2022-05-05 12:32:08.247000
Histogram with fixed size bins (bins=50)

[Variable name missing in source]
Date

Distinct: 45056
Distinct (%): 99.6%
Missing: 0
Missing (%): 0.0%
Memory size: 353.4 KiB
Minimum: 1900-01-01 00:00:00
Maximum: 2022-05-05 12:36:19.060000
Histogram with fixed size bins (bins=50)

[Variable name missing in source]
Date

Distinct: 45212
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 353.4 KiB
Minimum: 1900-01-01 00:00:00
Maximum: 2022-05-05 12:36:21.450000
Histogram with fixed size bins (bins=50)

Matricula
Categorical

HIGH CARDINALITY

Distinct: 1822
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 353.4 KiB

Value | Count
117TU139 | 633
8277TU188 | 589
687TU166 | 568
6165TU183 | 489
6289TU143 | 463
Other values (1817) | 42473

Length

Max length: 10
Median length: 9
Mean length: 8.71206458
Min length: 3

Characters and Unicode

Total characters: 393916
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 658
Unique (%): 1.5%

Sample

1st row: 9400TU157
2nd row: 7472TU203
3rd row: 834TU182
4th row: 5944TU137
5th row: 7891TU183

Common Values

Value | Count | Frequency (%)
117TU139 | 633 | 1.4%
8277TU188 | 589 | 1.3%
687TU166 | 568 | 1.3%
6165TU183 | 489 | 1.1%
6289TU143 | 463 | 1.0%
9788TU123 | 443 | 1.0%
3133TU173 | 438 | 1.0%
4857TU106 | 422 | 0.9%
5567TU92 | 410 | 0.9%
3904TU216 | 409 | 0.9%
Other values (1812) | 40351 | 89.2%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
117tu139 | 634 | 1.4%
8277tu188 | 589 | 1.3%
687tu166 | 568 | 1.3%
6165tu183 | 489 | 1.1%
6289tu143 | 463 | 1.0%
9788tu123 | 443 | 1.0%
3133tu173 | 438 | 1.0%
4857tu106 | 422 | 0.9%
5567tu92 | 410 | 0.9%
3904tu216 | 409 | 0.9%
Other values (1807) | 40350 | 89.2%
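
The lower-cased frequency table counts 634 rows for 117tu139, while the case-sensitive table counts 633 for 117TU139, and the character statistics for this variable report a handful of lower-case t and u characters. That suggests a few plates were entered in lower case and are being counted as separate categories. A minimal sketch of normalising case before counting; the sample plates are made up, and stripping surrounding whitespace is an extra assumption not evidenced by the report:

```python
from collections import Counter

def count_plates(plates):
    """Count licence plates after trimming whitespace and upper-casing."""
    return Counter(p.strip().upper() for p in plates)

# Hypothetical entries illustrating mixed-case duplicates.
sample = ["117TU139", "117tu139", " 8277TU188", "8277TU188"]

counts = count_plates(sample)
print(counts.most_common())
```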

Most occurring characters

Value | Count | Frequency (%)
1 | 56619 | 14.4%
T | 44285 | 11.2%
U | 44285 | 11.2%
2 | 33238 | 8.4%
8 | 29912 | 7.6%
7 | 28669 | 7.3%
3 | 28310 | 7.2%
9 | 27900 | 7.1%
6 | 26547 | 6.7%
5 | 25472 | 6.5%
Other values (6) | 48679 | 12.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 305284 | 77.5%
Uppercase Letter | 88622 | 22.5%
Lowercase Letter | 10 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 56619 | 18.5%
2 | 33238 | 10.9%
8 | 29912 | 9.8%
7 | 28669 | 9.4%
3 | 28310 | 9.3%
9 | 27900 | 9.1%
6 | 26547 | 8.7%
5 | 25472 | 8.3%
0 | 24354 | 8.0%
4 | 24263 | 7.9%
Uppercase Letter
Value | Count | Frequency (%)
T | 44285 | 50.0%
U | 44285 | 50.0%
R | 26 | < 0.1%
S | 26 | < 0.1%
Lowercase Letter
Value | Count | Frequency (%)
t | 5 | 50.0%
u | 5 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 305284 | 77.5%
Latin | 88632 | 22.5%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 56619 | 18.5%
2 | 33238 | 10.9%
8 | 29912 | 9.8%
7 | 28669 | 9.4%
3 | 28310 | 9.3%
9 | 27900 | 9.1%
6 | 26547 | 8.7%
5 | 25472 | 8.3%
0 | 24354 | 8.0%
4 | 24263 | 7.9%
Latin
Value | Count | Frequency (%)
T | 44285 | 50.0%
U | 44285 | 50.0%
R | 26 | < 0.1%
S | 26 | < 0.1%
t | 5 | < 0.1%
u | 5 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 393916 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 56619 | 14.4%
T | 44285 | 11.2%
U | 44285 | 11.2%
2 | 33238 | 8.4%
8 | 29912 | 7.6%
7 | 28669 | 7.3%
3 | 28310 | 7.2%
9 | 27900 | 7.1%
6 | 26547 | 6.7%
5 | 25472 | 6.5%
Other values (6) | 48679 | 12.4%

CodEntidade
Real number (ℝ≥0)

Distinct: 359
Distinct (%): 0.8%
Missing: 9
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 398668.9294
Minimum: 200482
Maximum: 400589
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 353.4 KiB

Quantile statistics

Minimum: 200482
5-th percentile: 400017
Q1: 400046
median: 400070
Q3: 400125
95-th percentile: 400373
Maximum: 400589
Range: 200107
Interquartile range (IQR): 79

Descriptive statistics

Standard deviation: 11902.34801
Coefficient of variation (CV): 0.02985521852
Kurtosis: 65.63598048
Mean: 398668.9294
Median Absolute Deviation (MAD): 24
Skewness: -8.197186897
Sum: 1.802222762 × 10^10
Variance: 141665888.1
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
400046 | 10592 | 23.4%
400093 | 5940 | 13.1%
400180 | 3228 | 7.1%
400030 | 1655 | 3.7%
400041 | 1276 | 2.8%
400059 | 1253 | 2.8%
400070 | 1094 | 2.4%
400044 | 975 | 2.2%
400091 | 922 | 2.0%
400187 | 791 | 1.7%
Other values (349) | 17480 | 38.7%

Smallest values
Value | Count | Frequency (%)
200482 | 1 | < 0.1%
300173 | 21 | < 0.1%
300179 | 1 | < 0.1%
300181 | 9 | < 0.1%
300182 | 3 | < 0.1%
300183 | 1 | < 0.1%
300185 | 4 | < 0.1%
300187 | 26 | 0.1%
300188 | 2 | < 0.1%
300190 | 2 | < 0.1%

Largest values
Value | Count | Frequency (%)
400589 | 3 | < 0.1%
400588 | 6 | < 0.1%
400587 | 2 | < 0.1%
400586 | 2 | < 0.1%
400585 | 3 | < 0.1%
400584 | 31 | 0.1%
400583 | 4 | < 0.1%
400494 | 5 | < 0.1%
400490 | 2 | < 0.1%
400488 | 1 | < 0.1%

CodMotorista
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 262
Distinct (%): 2.6%
Missing: 35326
Missing (%): 78.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 9900004980
Minimum: 9900000018
Maximum: 9900010784
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 353.4 KiB

Quantile statistics

Minimum: 9900000018
5-th percentile: 9900000160
Q1: 9900000984
median: 9900006045
Q3: 9900006892
95-th percentile: 9900009452
Maximum: 9900010784
Range: 10766
Interquartile range (IQR): 5908

Descriptive statistics

Standard deviation: 3137.19423
Coefficient of variation (CV): 3.168881467 × 10^-7
Kurtosis: -1.185618492
Mean: 9900004980
Median Absolute Deviation (MAD): 1851
Skewness: -0.3739206895
Sum: 9.790114924 × 10^13
Variance: 9841987.637
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
9900000849 | 348 | 0.8%
9900007766 | 308 | 0.7%
9900006681 | 300 | 0.7%
9900000259 | 298 | 0.7%
9900009452 | 286 | 0.6%
9900003790 | 276 | 0.6%
9900000160 | 274 | 0.6%
9900006779 | 271 | 0.6%
9900000153 | 260 | 0.6%
9900006010 | 249 | 0.6%
Other values (252) | 7019 | 15.5%
(Missing) | 35326 | 78.1%

Smallest values
Value | Count | Frequency (%)
9900000018 | 6 | < 0.1%
9900000090 | 1 | < 0.1%
9900000123 | 27 | 0.1%
9900000141 | 78 | 0.2%
9900000143 | 1 | < 0.1%
9900000153 | 260 | 0.6%
9900000160 | 274 | 0.6%
9900000164 | 1 | < 0.1%
9900000167 | 8 | < 0.1%
9900000168 | 92 | 0.2%

Largest values
Value | Count | Frequency (%)
9900010784 | 17 | < 0.1%
9900010507 | 1 | < 0.1%
9900010483 | 2 | < 0.1%
9900010437 | 1 | < 0.1%
9900010432 | 2 | < 0.1%
9900010400 | 1 | < 0.1%
9900010371 | 1 | < 0.1%
9900010325 | 1 | < 0.1%
9900010215 | 1 | < 0.1%
9900010166 | 1 | < 0.1%

NomeMotorista
Categorical

HIGH CARDINALITY

Distinct2695
Distinct (%)6.0%
Missing0
Missing (%)0.0%
Memory size353.4 KiB
AID NSAIBIA
 
474
CHOKRI BEN FRADJ
 
429
NOOMEN SAIDI
 
391
NOUREDINE SAIDI
 
369
LOTFI FARHANI
 
367
Other values (2690)
43185 

Length

Max length33
Median length25
Mean length13.97162446
Min length1

Characters and Unicode

Total characters631727
Distinct characters59
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique950 ?
Unique (%)2.1%

Sample

1st rowHICHEM HAMMAMI
2nd rowAHMED MESSAI
3rd rowanouer jerou
4th rowHAZEM MESSOUDI
5th rowZIED BEN TRAKI

Common Values

ValueCountFrequency (%)
AID NSAIBIA474
 
1.0%
CHOKRI BEN FRADJ429
 
0.9%
NOOMEN SAIDI391
 
0.9%
NOUREDINE SAIDI369
 
0.8%
LOTFI FARHANI367
 
0.8%
MOHAMED AYMEN BEN MANSOUR361
 
0.8%
maher dhif338
 
0.7%
HAYDER FARHAT322
 
0.7%
adel ben amara322
 
0.7%
anis hammami317
 
0.7%
Other values (2685)41525
91.8%

Length

2022-05-13T17:02:21.755208image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
Histogram of lengths of the category
ValueCountFrequency (%)
ben4953
 
5.0%
mohamed4302
 
4.3%
ali1683
 
1.7%
ahmed1214
 
1.2%
aymen1150
 
1.2%
maher1113
 
1.1%
hayder1030
 
1.0%
mourad1022
 
1.0%
lotfi934
 
0.9%
saidi904
 
0.9%
Other values (1996)80983
81.6%

Most occurring characters

Value | Count | Frequency (%)
A | 80733 | 12.8%
(space) | 54091 | 8.6%
I | 53799 | 8.5%
E | 44386 | 7.0%
M | 35927 | 5.7%
H | 34528 | 5.5%
D | 28819 | 4.6%
R | 28647 | 4.5%
N | 22523 | 3.6%
L | 22453 | 3.6%
Other values (49) | 225821 | 35.7%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 487373 | 77.1%
Lowercase Letter | 89615 | 14.2%
Space Separator | 54091 | 8.6%
Dash Punctuation | 453 | 0.1%
Other Punctuation | 186 | < 0.1%
Decimal Number | 8 | < 0.1%
Math Symbol | 1 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
A | 80733 | 16.6%
I | 53799 | 11.0%
E | 44386 | 9.1%
M | 35927 | 7.4%
H | 34528 | 7.1%
D | 28819 | 5.9%
R | 28647 | 5.9%
N | 22523 | 4.6%
L | 22453 | 4.6%
B | 21571 | 4.4%
Other values (14) | 113987 | 23.4%

Lowercase Letter
Value | Count | Frequency (%)
a | 15708 | 17.5%
i | 8261 | 9.2%
e | 7862 | 8.8%
m | 7430 | 8.3%
h | 7170 | 8.0%
d | 6018 | 6.7%
r | 5402 | 6.0%
o | 4272 | 4.8%
l | 3887 | 4.3%
b | 3691 | 4.1%
Other values (13) | 19914 | 22.2%

Decimal Number
Value | Count | Frequency (%)
7 | 2 | 25.0%
1 | 1 | 12.5%
4 | 1 | 12.5%
2 | 1 | 12.5%
0 | 1 | 12.5%
6 | 1 | 12.5%
9 | 1 | 12.5%

Other Punctuation
Value | Count | Frequency (%)
* | 176 | 94.6%
. | 10 | 5.4%

Space Separator
Value | Count | Frequency (%)
(space) | 54091 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 453 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 576988 | 91.3%
Common | 54739 | 8.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 80733 | 14.0%
I | 53799 | 9.3%
E | 44386 | 7.7%
M | 35927 | 6.2%
H | 34528 | 6.0%
D | 28819 | 5.0%
R | 28647 | 5.0%
N | 22523 | 3.9%
L | 22453 | 3.9%
B | 21571 | 3.7%
Other values (37) | 203602 | 35.3%

Common
Value | Count | Frequency (%)
(space) | 54091 | 98.8%
- | 453 | 0.8%
* | 176 | 0.3%
. | 10 | < 0.1%
7 | 2 | < 0.1%
+ | 1 | < 0.1%
1 | 1 | < 0.1%
4 | 1 | < 0.1%
2 | 1 | < 0.1%
0 | 1 | < 0.1%
Other values (2) | 2 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 631727 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
A | 80733 | 12.8%
(space) | 54091 | 8.6%
I | 53799 | 8.5%
E | 44386 | 7.0%
M | 35927 | 5.7%
H | 34528 | 5.5%
D | 28819 | 4.6%
R | 28647 | 4.5%
N | 22523 | 3.6%
L | 22453 | 3.6%
Other values (49) | 225821 | 35.7%

Interactions

[Figure: grid of pairwise interaction plots between the numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 no monotonic correlation, and +1 total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
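
The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The function names and the toy data are illustrative assumptions, and ties are given average ranks, which is a common convention rather than something this report states.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data: y grows monotonically but not linearly with x
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

On this toy data ρ reaches its maximum because the ordering is preserved exactly, while r stays below 1 because the relationship is not linear.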

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 no linear correlation, and +1 total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
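
A minimal sketch of this formula in plain Python follows. The function name is an assumption, and the `tara` and `bruto` lists reuse five Tara/bruto pairs from the sample rows of this report purely as example input.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n factors in covariance and variances cancel, so sums are used directly
    return cov / math.sqrt(var_x * var_y)


# Five Tara / bruto pairs from the sample rows
tara = [14250, 15600, 14800, 7450, 2950]
bruto = [29100, 45750, 45000, 17300, 7900]
r = pearson_r(tara, bruto)

# Shifting and rescaling each variable separately (positive factors) leaves r unchanged
r_rescaled = pearson_r([t / 1000 + 5 for t in tara], [2 * b - 100 for b in bruto])
```

Comparing `r` and `r_rescaled` demonstrates the location and scale invariance mentioned above.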

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 no correlation, and +1 total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
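
The pair-counting definition can be sketched directly. Note that this is the simple variant without tie correction (often called tau-a); library implementations commonly apply a tie-corrected variant, so values may differ when the data contain ties. The function name and toy inputs are assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


tau_same = kendall_tau_a([1, 2, 3, 4], [10, 20, 30, 40])
tau_reversed = kendall_tau_a([1, 2, 3, 4], [40, 30, 20, 10])
tau_mixed = kendall_tau_a([1, 2, 3], [1, 3, 2])  # 2 concordant, 1 discordant
```

The quadratic loop over all pairs is fine for illustration but would be slow on a dataset the size of this one.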

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
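
The following is a minimal sketch of a bias-corrected Cramér's V in the form usually attributed to Bergsma (2013). It is not taken from the report's own code; the function name and the toy category labels are illustrative assumptions.

```python
import math
from collections import Counter


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic over the full contingency table
    chi2 = 0.0
    for row, n_row in row_totals.items():
        for col, n_col in col_totals.items():
            expected = n_row * n_col / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)
    # Bias correction of phi^2 and of the table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy labels: the first pair is perfectly associated, the second independent
v_assoc = cramers_v_corrected(["PAL"] * 50 + ["SPEED1"] * 50,
                              ["PALETTE"] * 50 + ["SAC"] * 50)
v_indep = cramers_v_corrected(["x", "x", "y", "y"] * 25,
                              ["p", "q", "p", "q"] * 25)
```

Clamping the corrected φ² at zero keeps the coefficient inside the 0 to 1 range when the sample chi-squared is smaller than its expected value under independence.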

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
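
Nullity correlation, as described for the heatmap, can be sketched as the ordinary correlation between "is present" indicators of two columns. The function name and the three short columns below are hypothetical, with `None` standing in for a missing cell; returning `None` for a column with no variation in missingness is a choice made for this sketch.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    Returns None when either column is always present or always missing,
    because the correlation is undefined without variation.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


# Hypothetical columns; None marks a missing cell
posto = ["PAL", "SPEED1", None, None, "AUTOPAC", None]
inicio = ["11:24", "11:45", None, None, "10:18", None]
fecho = ["11:51", "12:36", "17:49", "18:47", "10:58", "18:25"]

together = nullity_correlation(posto, inicio)  # missing in exactly the same rows
constant = nullity_correlation(posto, fecho)   # fecho is never missing
opposite = nullity_correlation([1, None, 1, None], [None, 1, None, 1])
```

A value near 1 means two columns tend to be missing together, and a value near -1 means one tends to be present exactly when the other is missing.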

Sample

First rows

Index | TipoDoc/TipoViatura/CodProduto | DescProduto | estado | Tara | bruto | Liquido | qtdpedida | percDiff | PostoOperacao | DataCriacao | Dataentrada | TaraData | DataInicioOperacao | DataFimOperacao | BrutoData | DataFecho | Matricula | CodEntidade | CodMotorista | NomeMotorista
0 | TPZ0043 | CIMENT II/AL 32,5 R PALETTE | F | 14250 | 29100 | 14850 | 15000 | 1.000000 | PAL | 2022-05-05 11:19:38.347 | 2022-05-05 11:20:56.543 | 2022-05-05 11:24:41.697 | 2022-05-05 11:24:41.697 | 2022-05-05 11:42:16.453 | 2022-05-05 11:51:28.497 | 2022-05-05 11:51:30.590 | 9400TU157 | 400494.0 | NaN | HICHEM HAMMAMI
1 | TPZ0025 | CIMENT I 42,5 R SAC | F | 15600 | 45750 | 30150 | 30000 | -0.500000 | SPEED1 | 2022-05-05 10:51:03.003 | 2022-05-05 10:53:34.447 | 2022-05-05 11:45:59.750 | 2022-05-05 11:45:59.750 | 2022-05-05 12:32:08.247 | 2022-05-05 12:36:19.060 | 2022-05-05 12:36:21.450 | 7472TU203 | 400396.0 | 9.900007e+09 | AHMED MESSAI
2 | TPZ0025 | CIMENT I 42,5 R SAC | F | 14800 | 45000 | 30200 | 30000 | -0.666667 | SPEED1 | 2022-05-05 10:26:27.250 | 2022-05-05 10:29:35.117 | 2022-05-05 10:55:50.420 | 2022-05-05 10:55:50.420 | 2022-05-05 11:44:29.167 | 2022-05-05 12:09:31.900 | 2022-05-05 12:09:37.177 | 834TU182 | 400376.0 | NaN | anouer jerou
3 | TPZ0025 | CIMENT I 42,5 R SAC | F | 14800 | 44850 | 30050 | 30000 | -0.166667 | SPEED1 | 2022-05-05 10:07:39.820 | 2022-05-05 10:09:34.763 | 2022-05-05 10:20:37.433 | 2022-05-05 10:20:37.433 | NaT | 2022-05-05 10:54:07.287 | 2022-05-05 10:54:11.187 | 5944TU137 | 400588.0 | NaN | HAZEM MESSOUDI
4 | TPZ00442 | CIMENT I 42,5 R SR3 PALETTE | F | 7450 | 17300 | 9850 | 10000 | 1.500000 | PAL | 2022-05-05 10:00:56.097 | 2022-05-05 10:04:45.923 | 2022-05-05 10:22:43.433 | 2022-05-05 10:22:43.433 | 2022-05-05 10:28:07.907 | 2022-05-05 10:31:45.010 | 2022-05-05 10:31:50.837 | 7891TU183 | 400401.0 | NaN | ZIED BEN TRAKI
5 | TPZ0043 | CIMENT II/AL 32,5 R PALETTE | F | 2950 | 7900 | 4950 | 5000 | 1.000000 | PAL2 | 2022-05-05 09:36:07.237 | 2022-05-05 09:42:17.633 | 2022-05-05 09:43:52.837 | 2022-05-05 09:43:52.837 | NaT | 2022-05-05 10:16:14.480 | 2022-05-05 10:16:20.120 | 1718TU112 | 400086.0 | NaN | FETHI BOUSSAHA
6 | TPZ0022 | CIMENT II/AL 32,5 R SAC | F | 15250 | 45000 | 29750 | 30000 | 0.833333 | SPEED2 | 2022-05-05 09:35:14.067 | 2022-05-05 09:42:56.587 | 2022-05-05 10:06:20.490 | 2022-05-05 10:06:20.490 | 2022-05-05 10:56:51.007 | 2022-05-05 11:00:19.073 | 2022-05-05 11:00:25.933 | 319TU113 | 400086.0 | NaN | NABIL BEN MBAREK
7 | TPZ0022 | CIMENT II/AL 32,5 R SAC | F | 15650 | 45500 | 29850 | 30000 | 0.500000 | AUTOPAC | 2022-05-05 09:34:20.753 | 2022-05-05 09:36:19.813 | 2022-05-05 10:18:41.527 | 2022-05-05 10:18:41.527 | 2022-05-05 10:55:56.220 | 2022-05-05 10:58:23.180 | 2022-05-05 10:58:28.350 | 8170TU152 | 400046.0 | NaN | AYMEN RHIMI
8 | TPZ0022 | CIMENT II/AL 32,5 R SAC | F | 15000 | 44800 | 29800 | 30000 | 0.666667 | AUTOPAC | 2022-05-05 09:31:04.657 | 2022-05-05 09:32:15.940 | 2022-05-05 09:37:30.250 | 2022-05-05 09:37:30.250 | 2022-05-05 10:17:48.933 | 2022-05-05 10:21:26.287 | 2022-05-05 10:21:31.810 | 3513TU89 | 400046.0 | NaN | MOHAMED TRABELSI
9 | TPZ0022 | CIMENT II/AL 32,5 R SAC | F | 7250 | 17150 | 9900 | 10000 | 1.000000 | SPEED2 | 2022-05-05 08:06:41.350 | 2022-05-05 08:07:31.770 | 2022-05-05 08:10:47.133 | 2022-05-05 08:10:47.133 | 2022-05-05 08:57:19.607 | 2022-05-05 08:58:25.923 | 2022-05-05 08:58:31.920 | 1338TU219 | 400046.0 | NaN | KARIM BEN OUDA

Last rows

Index | TipoDoc/TipoViatura/CodProduto | DescProduto | estado | Tara | bruto | Liquido | qtdpedida | percDiff | PostoOperacao | DataCriacao | Dataentrada | TaraData | DataInicioOperacao | DataFimOperacao | BrutoData | DataFecho | Matricula | CodEntidade | CodMotorista | NomeMotorista
45205 | TPZ0043 | CIMENT II/AL 32,5 R PALETTE | F | 14660 | 44600 | 29940 | 30000 | 0.200000 | NaN | 2019-11-29 17:15:14.713 | 2019-11-29 17:15:47.017 | NaT | NaT | NaT | 2019-11-29 17:49:48.943 | 2019-11-29 17:49:52.023 | 7952TU75 | 300214.0 | NaN | IMED BEN ARFA
45206 | TPZ0025 | CIMENT I 42,5 R SAC | F | 9100 | 28950 | 19850 | 20000 | 0.750000 | NaN | 2019-11-29 17:06:24.993 | 2019-11-29 17:07:13.823 | NaT | NaT | NaT | 2019-11-29 18:46:56.293 | 2019-11-29 18:47:03.213 | 3264TU117 | 300263.0 | NaN | IBRAHIM MANAI
45207 | TPZ0043 | CIMENT II/AL 32,5 R PALETTE | F | 14940 | 44650 | 29710 | 30000 | 0.966667 | NaN | 2019-11-29 16:29:25.210 | 2019-11-29 16:30:03.210 | NaT | NaT | NaT | 2019-11-29 18:25:15.680 | 2019-11-29 18:25:18.627 | 4852TU136 | 300350.0 | NaN | HOSNI AHMED
45208 | TPZ0043 | CIMENT II/AL 32,5 R PALETTE | F | 15080 | 44700 | 29620 | 30000 | 1.266667 | NaN | 2019-11-29 16:17:19.667 | 2019-11-29 16:17:54.060 | NaT | NaT | NaT | 2019-11-29 17:24:38.260 | 2019-11-29 17:24:40.710 | 2423TU162 | 300350.0 | NaN | WALID DHIFAWI
45209 | TPZ0025 | CIMENT I 42,5 R SAC | F | 15780 | 45400 | 29620 | 30000 | 1.266667 | NaN | 2019-11-29 16:11:41.073 | 2019-11-29 16:12:41.897 | NaT | NaT | NaT | 2019-11-29 17:17:25.290 | 2019-11-29 17:17:27.660 | 2168TU154 | 300330.0 | NaN | ABDESSALEM KERANI
45210 | TPZ0022 | CIMENT II/AL 32,5 R SAC | F | 14840 | 44550 | 29710 | 30000 | 0.966667 | NaN | 2019-11-29 16:08:01.580 | 2019-11-29 16:08:42.433 | NaT | NaT | NaT | 2019-11-29 17:22:10.140 | 2019-11-29 17:22:12.367 | 1425TU163 | 300348.0 | NaN | MAJDI KHALIFI
45211 | TPZ0043 | CIMENT II/AL 32,5 R PALETTE | F | 15540 | 45250 | 29710 | 30000 | 0.966667 | NaN | 2019-11-29 16:03:04.663 | 2019-11-29 16:04:01.157 | NaT | NaT | NaT | 2019-11-29 16:58:42.220 | 2019-11-29 16:58:44.587 | 8986TU212 | 300271.0 | NaN | CHAABAN HAJRI
45212 | TPZ0022 | CIMENT II/AL 32,5 R SAC | F | 15420 | 45250 | 29830 | 30000 | 0.566667 | NaN | 2019-11-29 15:49:36.817 | 2019-11-29 15:50:29.987 | NaT | NaT | NaT | 2019-11-29 16:50:01.820 | 2019-11-29 16:50:04.760 | 6774TU135 | 300386.0 | NaN | MOHAMED RAHALI
45213 | TPZ0043 | CIMENT II/AL 32,5 R PALETTE | F | 14480 | 44400 | 29920 | 30000 | 0.266667 | NaN | 2019-11-29 10:41:38.277 | 2019-11-29 10:43:05.387 | NaT | NaT | NaT | 2019-11-29 12:20:35.480 | 2019-11-29 12:20:42.660 | 5820TU130 | 300319.0 | NaN | SABER HAFNAOUI
45214 | TPZ0043 | CIMENT II/AL 32,5 R PALETTE | F | 15000 | 44650 | 29650 | 30000 | 1.166667 | NaN | 2019-11-29 10:30:00.977 | 2019-11-29 10:33:13.603 | NaT | NaT | NaT | 2019-11-29 11:20:30.497 | 2019-11-29 11:20:40.560 | 620TU112 | 300214.0 | NaN | SALEM MIGHRI